---
title: "TMLE step by step"
author: 'By: Miguel Angel Luque Fernandez, miguel-angel.luque@lshtm.ac.uk'
date: "October 15th, 2016"
output:  
  html_notebook:
    code_folding: show
    highlight: default
    #keep_md: yes
    number_sections: yes
    theme: journal
    toc: yes
    toc_depth: 3
    toc_float:
      collapsed: no
      smooth_scroll: yes
csl: references/isme.csl
bibliography: references/bibliography.bib
font-import: http://fonts.googleapis.com/css?family=Risque
font-family: 'Risque'
---

<!--BEGIN:  Set the global options and load packages-->
```{r set-global-options, echo = FALSE}
knitr::opts_chunk$set(eval = TRUE, 
                      echo = TRUE, 
                      cache = FALSE,
                      include = TRUE,
                      collapse = FALSE,
                      dependson = NULL,
                      engine = "R", # Chunks will always have R code, unless noted
                      error = TRUE,
                      fig.path="Figures/",  # Set the figure options
                      fig.align = "center", 
                      fig.width = 7,
                      fig.height = 7)
```

# Introduction
1. **TMLE** is a general algorithm for the construction of double robust, semiparametric, efficient substitution estimators. TMLE allows for data-adaptive estimation while obtaining valid statistical inference. 
2. Although the **TMLE** implementation uses the G-computation estimand (G-formula), it adds a targeting step: the TMLE algorithm uses information in the estimated exposure mechanism P(A|W) to update the initial estimator of the conditional mean E$_{0}$(Y|A,W). 
3. The targeted estimates are then substituted into the parameter mapping. The updating step achieves a targeted bias reduction for the parameter of interest $\psi(P_{0})$ (the true target parameter) and serves to solve the efficient score equation. As a result, TMLE is a **double robust estimator**.
4. **TMLE** is consistent for $\psi(P_{0})$ if either the conditional expectation E$_{0}$(Y|A,W) or the exposure mechanism P$_{0}$(A|W) is estimated consistently. When both functions are consistently estimated, the **TMLE** is efficient: it achieves the lowest asymptotic variance among a large class of estimators. These asymptotic properties typically translate into lower bias and variance in finite samples.[@buh2016]
5. The advantages of TMLE have been repeatedly demonstrated in both simulation studies and applied analyses.[@van2011]
6. The procedure is available with standard software such as the **tmle** package in R [@gruber2011].
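
Items 2 and 3 describe the targeting step in words. The following chunk is a minimal sketch of that step for the ATE with a binary outcome, not the **tmle** package's implementation. It assumes simulated data that mirror the data-generating process used in the Data generation section, main-terms logistic regressions for both E$_{0}$(Y|A,W) and the exposure mechanism $g(W) = P(A=1 \mid W)$, and illustrative object names (`Qfit`, `gW`, `H_AW`, `eps`). The information in $g(W)$ enters through the so-called clever covariate $H(A,W) = A/g(W) - (1-A)/(1-g(W))$.

```{r}
# Minimal sketch of the TMLE targeting step for the ATE (binary A and Y).
# Assumptions: simulated data mirroring generateData(), and main-terms
# logistic regressions for both Q(A,W) and g(W); this is not the tmle package code.
set.seed(2016)
n  <- 1000
w1 <- rbinom(n, 1, 0.5);  w2 <- rbinom(n, 1, 0.65)
w3 <- runif(n, 0, 4);     w4 <- runif(n, 0, 5)
A  <- rbinom(n, 1, plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4))
Y  <- rbinom(n, 1, plogis(-1 + A - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
d  <- data.frame(w1, w2, w3, w4, A, Y)

# Initial estimate of E(Y | A, W) and its predictions with A set to 1 and to 0
Qfit <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = d)
QAW  <- predict(Qfit, type = "response")
Q1W  <- predict(Qfit, newdata = transform(d, A = 1), type = "response")
Q0W  <- predict(Qfit, newdata = transform(d, A = 0), type = "response")

# Estimated exposure mechanism g(W) = P(A = 1 | W)
gfit <- glm(A ~ w1 + w2 + w3 + w4, family = binomial, data = d)
gW   <- predict(gfit, type = "response")

# Clever covariate evaluated at the observed A, at A = 1, and at A = 0
H_AW <- A / gW - (1 - A) / (1 - gW)
H_1W <- 1 / gW
H_0W <- -1 / (1 - gW)

# Fluctuation: logistic regression of Y on H_AW with offset logit(QAW), no intercept
eps <- unname(coef(glm(Y ~ -1 + H_AW + offset(qlogis(QAW)), family = binomial)))

# Updated (targeted) predictions, substituted into the ATE mapping
QAW_star <- plogis(qlogis(QAW) + eps * H_AW)
Q1W_star <- plogis(qlogis(Q1W) + eps * H_1W)
Q0W_star <- plogis(qlogis(Q0W) + eps * H_0W)
psi_tmle <- mean(Q1W_star - Q0W_star)
cat("TMLE estimate of the ATE:", psi_tmle, "\n")
```

The fluctuation is a logistic regression with offset $\text{logit}\,\hat{Q}(A,W)$ and no intercept, so at the fitted `eps` it satisfies $\sum_{i} H(A_i,W_i)\,[Y_i - \hat{Q}^{*}(A_i,W_i)] = 0$; together with substituting $\hat{Q}^{*}(1,W)$ and $\hat{Q}^{*}(0,W)$ into the mean difference, this is how the updated estimate solves the efficient score equation for the ATE. No truncation of $\hat{g}(W)$ is applied, which is only reasonable here because the simulated propensity scores stay well away from 0 and 1.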

# The G-Formula
1. $\psi(P_{0})\,=\,\sum_{w}\,\left[\sum_{y}\,y\,P(Y=y\mid A=1,W=w)-\,\sum_{y}\,y\,P(Y = y\mid A=0,W=w)\right]P(W=w)$  
where  
$P(Y = y \mid A = a, W = w)\,=\,\frac{P(W = w, A = a, Y = y)}{\sum_{y}\,P(W = w, A = a, Y = y)}$  
is the conditional probability distribution of Y = y, given A = a, W = w and,  
$P(W = w)\,=\,\sum_{y,a}\,P(W = w, A = a, Y = y)$  
2. Using classical regression methods to control confounding requires assuming that the effect measure is constant across the levels of the confounders included in the model.
3. Alternatively, **standardization** allows us to obtain an unconfounded summary effect measure without requiring this assumption. The **G-formula** is a *generalization of standardization* [@robins1986].
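
When W is discrete, the G-formula above can be evaluated directly: estimate E(Y | A = a, W = w) by the mean outcome within each stratum and treatment group, then average the stratum-specific differences with weights P(W = w). The R sketch below assumes a hypothetical, simplified simulation (not the tutorial's `generateData()`) in which only two binary covariates confound A and Y, so every stratum contains both treated and untreated subjects. It also computes the population ATE implied by that simulation for comparison.

```{r}
# Nonparametric G-formula (standardization) with a discrete covariate set W = (w1, w2).
# Assumption: hypothetical simulation in which only w1 and w2 confound A and Y.
set.seed(1986)
n  <- 5000
w1 <- rbinom(n, 1, 0.5)
w2 <- rbinom(n, 1, 0.65)
A  <- rbinom(n, 1, plogis(-0.4 + 0.2*w1 + 0.6*w2))
Y  <- rbinom(n, 1, plogis(-1 + A - 0.1*w1 + 0.3*w2))

strata <- expand.grid(w1 = 0:1, w2 = 0:1)
psi_gformula <- 0
for (i in seq_len(nrow(strata))) {
  in_w <- w1 == strata$w1[i] & w2 == strata$w2[i]
  p_w  <- mean(in_w)              # estimate of P(W = w)
  m1   <- mean(Y[in_w & A == 1])  # estimate of E(Y | A = 1, W = w)
  m0   <- mean(Y[in_w & A == 0])  # estimate of E(Y | A = 0, W = w)
  psi_gformula <- psi_gformula + (m1 - m0) * p_w
}

# Population ATE implied by this simulation, for comparison
p_true   <- 0.5 * ifelse(strata$w2 == 1, 0.65, 0.35)
true_psi <- sum((plogis(-0.1*strata$w1 + 0.3*strata$w2) -
                 plogis(-1 - 0.1*strata$w1 + 0.3*strata$w2)) * p_true)
cat("G-formula estimate:", psi_gformula, " population ATE:", true_psi, "\n")
```

Because the stratum means are nonparametric, no assumption of a constant effect across strata is needed; the price is that every stratum must contain both treatment levels, which becomes impractical as the number of covariates grows.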

# Causal assumptions
Under the counterfactual framework, the following assumptions must hold for the estimate of the average treatment effect (ATE) to be interpreted as causal: 

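## CMI or Randomization
$(Y_{0},Y_{1}) \perp A \mid W$: given the set of observed covariates $W = (W_{1}, W_{2}, W_{3}, \dots, W_{k})$, the potential outcomes $Y_{0}$ and $Y_{1}$ are independent of the binary treatment $A$, i.e., there is no unmeasured confounding of the effect of $A$ on the outcome $Y$. 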

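## Positivity
For every treatment level $a \in \mathcal{A}$: $P(A=a \mid W) > 0$.  
That is, $P(A=1 \mid W=w) > 0$ and $P(A=0 \mid W=w) > 0$ for each possible $w$: every covariate pattern must have a nonzero probability of receiving both treatment levels.  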
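
Positivity can be checked empirically by inspecting the estimated exposure mechanism. The chunk below is a minimal sketch, assuming the data-generating process of `generateData()` (re-simulated here so the chunk runs on its own) and a main-terms logistic model for $P(A=1 \mid W)$; the 0.025 cutoff for flagging near-violations is an illustrative choice, not a rule stated in this tutorial.

```{r}
# Positivity diagnostics: a sketch assuming the data-generating process of generateData()
# and a main-terms logistic model for the exposure mechanism g(W) = P(A = 1 | W).
set.seed(7858)
n  <- 1000
w1 <- rbinom(n, 1, 0.5);  w2 <- rbinom(n, 1, 0.65)
w3 <- runif(n, 0, 4);     w4 <- runif(n, 0, 5)
A  <- rbinom(n, 1, plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4))

# Estimated propensity score for every subject
gfit <- glm(A ~ w1 + w2 + w3 + w4, family = binomial)
gW   <- fitted(gfit)

# Range of the estimated propensity score within each treatment group
print(tapply(gW, A, range))

# Proportion of subjects whose estimated P(A = 1 | W) or P(A = 0 | W) falls below
# an illustrative cutoff (0.025 is an assumption, not a rule from this tutorial)
cutoff <- 0.025
prop_extreme <- mean(gW < cutoff | gW > 1 - cutoff)
cat("Proportion of near-violations:", prop_extreme, "\n")

# For a discrete covariate, both treatment levels should occur at every value
print(table(w2, A))
```

Estimated propensity scores close to 0 or 1 signal practical positivity violations, which inflate the variance of estimators that divide by $g(W)$ or $1 - g(W)$.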

## Consistency or SUTVA
The observed outcome value under the observed treatment equals the counterfactual outcome corresponding to that treatment, i.e., $Y = A\,Y_{1} + (1-A)\,Y_{0}$, and the observations are independent and identically distributed (i.i.d.).    

# TMLE flow chart
**Source**: Mark van der Laan and Sherri Rose. *Targeted learning: causal inference for observational and experimental data*. Springer Series in Statistics, 2011.
![](Figures/tmle.png)

# Data generation
In R, we create a function that takes the number of draws as input and returns the observed data (ObsData) together with the counterfactual outcomes (Y.1, Y.0).   
The observed data:  

1. Y: mortality binary indicator (1 death, 0 alive) 
2. A: binary treatment for emergency presentation at cancer diagnosis (1 EP, 0 NonEP)  
3. W1: Gender (1 male; 0 female)  
4. W2: Age at diagnosis (0 <65; 1 >=65)  
5. W3: Cancer TNM classification (scale from 1 to 4; simulated below as a continuous uniform value between 0 and 4)  
6. W4: Comorbidities (scale from 1 to 5; simulated below as a continuous uniform value between 0 and 5)  

```{r}
options(digits = 3)

# Simulate n observations: four covariates, a binary treatment A,
# the two counterfactual outcomes (Y.1, Y.0) and the observed outcome Y
generateData <- function(n){
  w1 <- rbinom(n, size=1, prob=0.5)
  w2 <- rbinom(n, size=1, prob=0.65)
  w3 <- round(runif(n, min=0, max=4), digits=3)
  w4 <- round(runif(n, min=0, max=5), digits=3)
  A  <- rbinom(n, size=1, prob= plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4))
  
  # counterfactual outcomes under A = 1 and A = 0
  Y.1 <- rbinom(n, size=1, prob= plogis(-1 + 1 -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
  Y.0 <- rbinom(n, size=1, prob= plogis(-1 + 0 -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
  
  # observed outcome: the counterfactual under the treatment actually received (consistency)
  Y <- A*Y.1 + (1 - A)*Y.0
  
  # return data.frame
  data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
}
set.seed(7858)
ObsData <- generateData(n=1000)

# True ATE in this sample: mean of the individual counterfactual differences
True_Psi <- mean(ObsData$Y.1 - ObsData$Y.0)
cat(" True_Psi:", True_Psi)

# Naive (confounded) estimate: unadjusted risk difference from a linear probability model
Bias_Psi <- lm(data=ObsData, Y ~ A)
cat("\n")
cat("\n Naive_Biased_Psi:", summary(Bias_Psi)$coef[2, 1])
```

# Data visualization
```{r}
# DT table = interactive
# install.packages("DT") # install DT first
library(DT)
# show the full simulated data, 5 rows per page, with w3 and w4 rounded to 2 decimals
formatRound(datatable(ObsData, options = list(pageLength = 5)), columns = c("w3", "w4"), digits = 2)
```

# TMLE implementation

## 1st step: E$_{0}$(Y|A,W)  
The first step estimates the initial probability of the outcome (Y) given the treatment (A) and the set of covariates (W), denoted $Q_{0}$(A,**W**). To estimate $Q_{0}$(A,**W**), we can use a standard logistic regression model: 

$\text{logit}[P(Y=1 \mid A,\mathbf{W})]\,=\,\beta_{0}\,+\,\beta_{1}A\,+\,\beta_{2}^{T}\mathbf{W}$.    

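Therefore, we can estimate the initial probability as follows:

$\hat{Q}_{0}(A,\mathbf{W})\,=\,\text{expit}(\hat{\beta}_{0}\,+\,\hat{\beta}_{1}A\,+\,\hat{\beta}_{2}^{T}\mathbf{W})$      (1)

where $\text{expit}(x) = 1/(1+e^{-x})$ is the inverse of the logit function. The predicted probability can also be estimated using the Super Learner ensemble implemented in the R package **SuperLearner**, which can include any terms that are functions of A or W (e.g., polynomial terms of A and W, as well as interaction terms of A and W). Consequently, for each subject, the predicted probabilities for both potential outcomes, $\hat{Q}_{0}(0,\mathbf{W})$ and $\hat{Q}_{0}(1,\mathbf{W})$, can be estimated by setting A = 0 and A = 1 for everyone, respectively:

$\hat{Q}_{0}(0,\mathbf{W})\,=\,\text{expit}(\hat{\beta}_{0}\,+\,\hat{\beta}_{2}^{T}\mathbf{W})$ and $\hat{Q}_{0}(1,\mathbf{W})\,=\,\text{expit}(\hat{\beta}_{0}\,+\,\hat{\beta}_{1}\,+\,\hat{\beta}_{2}^{T}\mathbf{W})$.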
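
Equation (1) and the two counterfactual predictions can be reproduced with a plain `glm()` fit. The chunk below is a sketch that uses a main-terms logistic regression rather than Super Learner, and re-simulates data following `generateData()` so that it runs on its own. It checks that `predict()` with A set to 1 or 0 matches the expit expressions computed by hand from the fitted coefficients, and averages the difference of the two predictions to obtain the G-computation (substitution) estimate of the ATE before any targeting.

```{r}
# Step 1 sketch: initial estimate Q0(A, W) = P(Y = 1 | A, W) by main-terms logistic regression.
# Assumption: data re-simulated following generateData(); Super Learner is not used here.
set.seed(7858)
n  <- 1000
w1 <- rbinom(n, 1, 0.5);  w2 <- rbinom(n, 1, 0.65)
w3 <- runif(n, 0, 4);     w4 <- runif(n, 0, 5)
A  <- rbinom(n, 1, plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4))
Y  <- rbinom(n, 1, plogis(-1 + A - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
d  <- data.frame(w1, w2, w3, w4, A, Y)

# logit[P(Y = 1 | A, W)] = b0 + b1*A + b2'W
Qfit <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = d)

# Predictions at the observed A, and with A set to 1 and to 0 for everyone
QAW <- predict(Qfit, type = "response")
Q1W <- predict(Qfit, newdata = transform(d, A = 1), type = "response")
Q0W <- predict(Qfit, newdata = transform(d, A = 0), type = "response")

# Same predictions by hand: expit(b0 + b1 + b2'W) and expit(b0 + b2'W)
b <- coef(Qfit)
W <- as.matrix(d[, c("w1", "w2", "w3", "w4")])
Q1W_manual <- drop(plogis(b[1] + b[2] + W %*% b[3:6]))
Q0W_manual <- drop(plogis(b[1] + W %*% b[3:6]))

# G-computation (substitution) estimate of the ATE from the initial fit
psi_gcomp <- mean(Q1W - Q0W)
cat("Initial (G-computation) ATE estimate:", psi_gcomp, "\n")
```

Swapping `glm()` for a more flexible learner changes only how `Q1W` and `Q0W` are produced; the substitution into the mean difference stays the same.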



# Thank you  
Thank you for participating in this tutorial.  
If you have updates or changes that you would like to make, please send <a href="https://github.com/migariane/MALF" target="_blank">me</a> a pull request.
Alternatively, if you have any questions, please e-mail me.  
**Miguel Angel Luque Fernandez**  
**E-mail:** *miguel-angel.luque at lshtm.ac.uk*  
**Twitter** `@WATZILEI`  

# Session Info 
```{r session-info, results ='markup'}
devtools::session_info()
```
# References 
